Detection of Planted Solutions
for Flat Satisfiability Problems 

Quentin Berthet and Jordan S. Ellenberg
California Institute of Technology and University of Wisconsin 

Abstract. We study the detection problem of finding planted solutions
in random instances of flat satisfiability problems, a generalization of
boolean satisfiability formulas. We describe the properties of random
instances of flat satisfiability, as well as the optimal rates of detection
of the associated hypothesis testing problem. We also study the performance
of an algorithmically efficient testing procedure. We introduce
a modification of our model, the light planting of solutions, and show
that it is as hard as the problem of learning parity with noise. This
hints strongly at the difficulty of detecting planted flat satisfiability for
a wide class of tests.

1. INTRODUCTION

The rapid growth in many scientific fields of the size of typical datasets, and 
the increasingly complex models that are studied, have naturally brought forth 
the notions of statistical and computational complexity in learning theory. For 
many learning problems motivated by such applications, the algorithmic aspect 
of inference procedures cannot be ignored: it is necessary to consider jointly the 
difficulties posed by the presence of noise or random errors, and by computational 
hardness. 

The problem of understanding the tradeoffs between algorithmic and statistical
efficiency has therefore attracted a lot of interest. A particularly successful
approach has been to investigate the links between learning problems that
naturally arise, inspired by applications, and more abstract problems related
to random discrete structures, which have been extensively studied in theoretical
computer science. A hypothesis of [Fei02], based on the hardness of refuting
satisfiability in random satisfiability formulas - initially used to prove hardness
of approximation for several problems - has been used as a primitive to show
hardness of improper learning [DLSS12, DLSS13, LSSS14]. A hypothesis on
the planted clique problem has also been used as a primitive to prove computational
limits to inference, initially for sparse principal component detection
in [BR13], and subsequently for other problems in high dimensional statistics
[MW13, Che13, WBS14, GMZ15, CLR15].

The desire to understand barriers to learning that come from randomness and
computation has therefore brought attention to these problems, and to the questions
of learning distributions of their instances in a computationally efficient
manner. Examples include [FGR+13, FPV13, FK14, FPV14], investigating the
query complexity of statistical algorithms for these problems [Kea98], or [Ber14],
treating the problem of satisfiability detection as a hypothesis testing problem.

We consider here a learning problem on sets of flats in $\mathbb{F}_2^n$, shown to be a
generalization of the $k$-SAT problem in $n$ variables. We introduce the $k$-FLAT
problem over sets of $m$ flats of dimension $n - k$, which are flat satisfiable if they do
not cover all of $\mathbb{F}_2^n$. This is analogous to satisfiability formulas, which are satisfiable
if the $m$ clauses do not exclude all the assignments. We also introduce a learning
problem over these instances. It is formulated as a high-dimensional inference
problem of hypothesis testing between the uniform distribution and the planted
distribution, where an unknown assignment of $\mathbb{F}_2^n$ is made flat-satisfying. We
study the optimal rates of detection for this problem, in a minimax sense, based
on various parameters. We show that the optimal sample size $m$ scales linearly
with the dimension $n$. Even if they are derived considering only information-theoretic
limits, these rates are useful as benchmarks. They give a context to the
performance of candidate algorithms, and let us see if there is a gap between what
we are able to achieve and the best possible case. We introduce a polynomial-time
algorithm for a test, inspired by a technique of [AG11], and show that the
test is successful for a sample of order $n^k$. We discuss how a modification of the
problem, denoted by lightly planted flat satisfiability - that does not significantly
alter it from a purely statistical point of view - affects the computational aspects,
making it as hard as the “Learning Parity with Noise” problem [BKW03]. We
also show how this result strongly suggests that a wide class of testing methods
cannot be used for detection of planted solutions for flat satisfiability.

This article is structured in the following manner: Section 2 is focused on
the description of these problems. We introduce the $k$-FLAT problem, and the
associated problem of detecting planted solutions. In Section 3, we show that there
exists a sharp phase transition for flat satisfiability of random instances, with a
threshold at an explicit constant $\Delta_k$ in the linear regime $m = \Delta n$. In Section 4, we
use this result to derive the optimal rate of detection, with an optimal constant
that coincides with the flat satisfiability transition. In Section 5, we show that a
test that can be computed in polynomial time will be successful with a sample
size that is polynomial in $n$. We introduce in Section 6 the problem of detecting
a lightly planted solution, for which we describe optimal rates of detection, and
discuss computational aspects.

2. PROBLEM DESCRIPTION 
2.1 The $k$-FLAT problem

Consider $\mathbb{F}_2^n$, the $n$-dimensional coordinate space over $\mathbb{F}_2$. We are given $V =
(V_1, \ldots, V_m)$, a collection of $m$ flats of dimension $n - k$, or $k$-flats of $\mathbb{F}_2^n$. We denote
by $k$-FLAT the problem of determining whether there exists an element $x \in \mathbb{F}_2^n$
that is flat satisfying, i.e. that does not lie on any of the $V_j$, or equivalently,
whether the $V_j$ fail to cover the space, i.e. $\mathbb{F}_2^n \neq \cup_j V_j$. We can define the flats by
taking $k$ linearly independent linear forms $\ell_{j,1}, \ldots, \ell_{j,k}$ and $k$ values
$\varepsilon_{j,1}, \ldots, \varepsilon_{j,k} \in \mathbb{F}_2$, and having

$$V_j = \{x \in \mathbb{F}_2^n : \ell_{j,i}(x) = \varepsilon_{j,i}, \ \forall i \in [k]\}.$$

We note that there are many such descriptions for any flat, but choosing the
$\ell_{j,i}$ and $\varepsilon_{j,i}$ uniformly at random does yield the uniform distribution on flats. We
also note that if we constrain the flats to be coordinate-aligned by taking each
linear form among the projections on one of the coordinates, the $V_j$ can be interpreted
as satisfiability clauses on $k$ literals, and the set $V_1, \ldots, V_m$ as a satisfiability
formula with $m$ clauses: for each $x \in \mathbb{F}_2^n$, $x$ satisfies the $j$-th clause if and only if
$x \notin V_j$, and satisfies the formula if and only if this is the case for all the $V_j$. The
set of flat satisfying assignments is therefore $\mathbb{F}_2^n \setminus \cup_j V_j$. The problem described
above is therefore a generalization of $k$-satisfiability. Thus, the $k$-FLAT problem
is NP-complete for $k \ge 3$.
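The definitions above can be made concrete with a small brute-force sketch. It is only an illustration under stated assumptions: a linear form is encoded as an integer bitmask over the $n$ coordinates, a flat as a pair (forms, values), and the helper names (`random_flat`, `count_satisfying`, and so on) are hypothetical, not taken from the paper. Counting the flat satisfying elements by enumeration takes time $2^n$, so this is usable only for small $n$.

```python
import random

def parity(v: int) -> int:
    """Value in F_2 of a linear form: parity of the set bits of v."""
    return bin(v).count("1") & 1

def rank_f2(vectors) -> int:
    """Rank over F_2 of bitmask-encoded vectors, via an XOR basis."""
    basis = {}  # leading bit position -> basis vector
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)

def random_flat(n: int, k: int, rng: random.Random):
    """Uniform k-flat: k independent forms (rejection sampling) and k uniform bits."""
    while True:
        forms = [rng.randrange(1 << n) for _ in range(k)]
        if rank_f2(forms) == k:
            break
    eps = [rng.randrange(2) for _ in range(k)]
    return forms, eps

def in_flat(x: int, flat) -> bool:
    """True if l_i(x) = eps_i for every one of the k constraints."""
    forms, eps = flat
    return all(parity(f & x) == e for f, e in zip(forms, eps))

def count_satisfying(n: int, flats) -> int:
    """Number of points lying on none of the flats, by enumerating all 2^n points."""
    return sum(1 for x in range(1 << n)
               if not any(in_flat(x, flat) for flat in flats))

rng = random.Random(0)
n, k, m = 8, 3, 20
V = [random_flat(n, k, rng) for _ in range(m)]
print("flat satisfying elements:", count_satisfying(n, V), "out of", 2 ** n)
```

Rejection sampling keeps the draw uniform over ordered $k$-tuples of linearly independent forms, which is the generating process described in the text.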

We denote by $S(V)$ the set of flat satisfying elements $\mathbb{F}_2^n \setminus \cup_j V_j$, and by $Z(V)$ its
cardinality. We write $S$ and $Z$ when it is not ambiguous. We denote by FLAT the
set of $V$ that are flat satisfiable, i.e. for which there exists a satisfying element.
We will consider asymptotics in the linear regime of $m = \Delta n$, for a constant
$\Delta > 0$, and $m, n \to +\infty$.

2.2 Detection of planted flat-satisfiable assignment 

Given a random instance $V$, our goal is to distinguish two hypotheses for its
underlying joint distribution. This detection problem is a generalization of the
problem of detecting planted satisfiability [Ber14]. Under the uniform distribution
(denoted by $P_{\mathrm{unif}}$) the $V_j$ are independent and identically distributed. Their
distribution is uniform on the set of flats of dimension $n - k$. A possible way to generate
them is to draw uniformly $k$ linearly independent linear forms $\ell_{j,1}, \ldots, \ell_{j,k}$
and independently $k$ values $\varepsilon_{j,1}, \ldots, \varepsilon_{j,k} \in \mathbb{F}_2$, and to define

$$V_j = \{x \in \mathbb{F}_2^n : \ell_{j,i}(x) = \varepsilon_{j,i}, \ \forall i \in [k]\}.$$

Under the planted distribution (denoted by $P_{\mathrm{planted}}$), an element $x^* \in \mathbb{F}_2^n$ is
chosen uniformly. Conditioned on this element, the $V_j$ are independent and identically
distributed, with a distribution denoted by $P_{x^*}$. Under this distribution,
they are chosen uniformly on the set of flats of dimension $n - k$ that do not contain
$x^*$. They can be generated in a similar manner as under the uniform distribution,
by drawing uniformly $k$ linearly independent linear forms $\ell_{j,1}, \ldots, \ell_{j,k}$, and the
$k$ values $\varepsilon_{j,i}$ uniformly among the $2^k - 1$ choices that differ from
$(\ell_{j,1}(x^*), \ldots, \ell_{j,k}(x^*))$. We
define $V_j$ similarly. By construction, it does not contain $x^*$, which is a satisfying
assignment for $V$.
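As an illustration of the planted generating process just described, here is a minimal sketch. It assumes the same hypothetical bitmask encoding of linear forms as integers; the function names are illustrative only. The values are redrawn until they differ from the evaluations of the forms at the planted element, which is uniform over the $2^k - 1$ allowed choices.

```python
import random

def parity(v: int) -> int:
    """Value in F_2 of a linear form: parity of the set bits of v."""
    return bin(v).count("1") & 1

def is_independent(forms) -> bool:
    """True if the bitmask-encoded forms are linearly independent over F_2."""
    basis = {}
    for v in forms:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
        else:
            return False  # v reduced to zero: it depends on earlier forms
    return True

def planted_flat(n: int, k: int, x_star: int, rng: random.Random):
    """Uniform k-flat among those that do not contain x_star."""
    while True:
        forms = [rng.randrange(1 << n) for _ in range(k)]
        if is_independent(forms):
            break
    at_x = [parity(f & x_star) for f in forms]
    while True:
        eps = [rng.randrange(2) for _ in range(k)]
        if eps != at_x:  # exclude the single choice that would contain x_star
            return forms, eps

def in_flat(x: int, flat) -> bool:
    forms, eps = flat
    return all(parity(f & x) == e for f, e in zip(forms, eps))

rng = random.Random(0)
n, k, m = 10, 3, 50
x_star = rng.randrange(1 << n)
V = [planted_flat(n, k, x_star, rng) for _ in range(m)]
print("x_star avoids every flat:", all(not in_flat(x_star, f) for f in V))
```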


Remark 2.1. Let $G$ be the subgroup of $GL_n(\mathbb{F}_2)$ consisting of linear transformations
fixing $x^*$. Then $G$ acts transitively on the $k$-flats not containing $x^*$. In
particular, a probability distribution on $k$-flats which is supported on $k$-flats not
containing $x^*$, and which is invariant under $G$, must be uniform on the $k$-flats
not containing $x^*$; in other words, it is the distribution $P_{x^*}$ described above. In
particular, the procedure of choosing $k$ linear forms $\ell_i$ and $k$ bits $\varepsilon_i$ uniformly at
random subject to the conditions that the $\ell_i$ are linearly independent, and that
$\ell_i(x^*) - \varepsilon_i$ is nonzero for at least one $i$, is evidently $G$-invariant; thus, the resulting
distribution on $k$-flats is $P_{x^*}$. In this paper we will mostly use this description
of $P_{x^*}$. But we want to emphasize that there are many such descriptions, i.e.
many distributions on $k$-tuples of pairs $(\ell, \varepsilon)$ which yield the distribution $P_{x^*}$ on
$k$-flats. For instance, we could choose the $\ell_i$ as above, but then choose an $i$ at
random, require that $\ell_i(x^*) - \varepsilon_i = 1$, and allow the other $k - 1$ bits $\varepsilon_j$ to be chosen
independently at random. Or we could require $\ell_i(x^*) - \varepsilon_i = 1$ for all $i$. Any of
these processes results in a $G$-invariant distribution on $k$-flats not containing $x^*$,
which can only be $P_{x^*}$.

In order to avoid confusion regarding the representation of these flats, we consider
here that the input data is the actual flat, given to us either as a membership
oracle - a function that returns whether any element of $\mathbb{F}_2^n$ belongs to the flat $V_j$
- or as a uniformly random basis $\ell_j$ of the space of linear forms that are constant
on the flat, and the corresponding values $\varepsilon_j$. From a purely statistical point of
view, this makes no difference; it is equivalent to consider a membership oracle,
or the finite list of the elements of $V_j$, or the basis described here. From an algorithmic
point of view, we will consider that our data is a uniformly random basis
of linear forms and the associated values $(\ell_j, \varepsilon_j)$ for the $k$-flat, which then has
the distribution described above.

Formally, we denote by $q_0$ the uniform distribution on $k$-flats of $\mathbb{F}_2^n$, and for
all $x \in \mathbb{F}_2^n$ by $q_x$ the uniform distribution on $k$-flats of $\mathbb{F}_2^n$ that do not contain
$x$. With these notations, the distributions considered in this problem are defined
thus

$$P_{\mathrm{unif}} := q_0^{\otimes m}, \qquad P_x := q_x^{\otimes m}, \qquad P_{\mathrm{planted}} := \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} P_x.$$

Our detection problem can be written as testing between two hypotheses

$$H_0 : V = (V_1, \ldots, V_m) \sim P_{\mathrm{unif}}$$

$$H_1 : V = (V_1, \ldots, V_m) \sim P_{\mathrm{planted}}.$$

3. FLAT-SATISFIABILITY THRESHOLD 

In this section, we study the probability that a uniformly random instance $V$
of the $k$-FLAT problem is flat satisfiable, when $m = \Delta n$, as a function of $\Delta > 0$.
This is achieved by studying the first two moments of $Z(V)$.

Lemma 3.1. Under the uniform distribution

$$E[Z] = 2^n (1 - 2^{-k})^m.$$

Proof. It holds that

$$Z = \sum_{x \in \mathbb{F}_2^n} \prod_{j=1}^{m} 1\{x \notin V_j\}.$$

By linearity, symmetry of the distribution, and independence of the $V_j$, for any
$x_0 \in \mathbb{F}_2^n$

$$E[Z] = 2^n \left(P_{\mathrm{unif}}(x_0 \notin V_1)\right)^m.$$

Furthermore, for each $k$-flat of $\mathbb{F}_2^n$, $|V_1| = 2^{n-k}$, which yields the desired result. □

Lemma 3.2. Let $V = (V_1, \ldots, V_m)$ be a random collection of $m$ $k$-flats of $\mathbb{F}_2^n$
with distribution $P_{\mathrm{unif}}$. Let $m = \Delta n$, for some $\Delta > 0$. We have

$$\frac{E[Z^2]}{E[Z]^2} \le 1 + o(1) + \frac{1}{E[Z]}.$$

Proof. We derive the second moment of $Z$

$$Z^2 = \sum_{x, x'} 1\{x \in S(V)\} 1\{x' \in S(V)\}
= \sum_{x} 1\{x \in S(V)\} + \sum_{x \neq x'} 1\{x \in S(V)\} 1\{x' \in S(V)\}.$$

Taking expectation yields

$$E[Z^2] = E[Z] + \sum_{x \neq x'} P_{\mathrm{unif}}\left(\{x \in S(V)\} \cap \{x' \in S(V)\}\right).$$

The uniform distribution is invariant under the action of the affine group $\mathrm{Aff}_n(\mathbb{F}_2)$,
which is doubly transitive on $\mathbb{F}_2^n$. Therefore, the term
$P_{\mathrm{unif}}(\{x \in S(V)\} \cap \{x' \in S(V)\})$ is
constant for all couples of distinct elements $(x, x')$ of $\mathbb{F}_2^n$. To compute this probability,
it thus suffices to consider that $x$ and $x'$ are uniformly randomly chosen
among the set of pairs of distinct elements. For all $j \in [m]$, this yields

$$P_{\mathrm{unif}}\left(\{x \notin V_j\} \cap \{x' \notin V_j\}\right)
= \frac{2^n - 2^{n-k}}{2^n} \cdot \frac{2^n - 2^{n-k} - 1}{2^n - 1}
\le (1 - 2^{-k}) \left(1 - 2^{-k} + \frac{2 + 2^{-k}}{2^n - 1}\right).$$

Using this in the derivation of the second moment, we have

$$E[Z^2] \le E[Z] + (2^{2n} - 2^n)(1 - 2^{-k})^m \left(1 - 2^{-k} + \frac{2 + 2^{-k}}{2^n - 1}\right)^m$$

$$\le E[Z] + 2^{2n} (1 - 2^{-k})^{2m} \left(1 + \frac{2 + 2^{-k}}{1 - 2^{-k}} \cdot \frac{1}{2^n - 1}\right)^m$$

$$\le E[Z] + E[Z]^2 \left(1 + \frac{2 + 2^{-k}}{1 - 2^{-k}} \cdot \frac{1}{2^n - 1}\right)^{\Delta n}.$$

Note that the last factor is $1 + o(1)$. Dividing by $E[Z]^2$ yields the desired result. □


Together, Lemmas 3.1 and 3.2 yield the following

Theorem 3.3. For $k > 0$ let $\Delta_k := \log(1/2)/\log(1 - 2^{-k}) \approx 2^k \log(2)$. For
$\Delta > 0$, let $m = \Delta n$, and $V$ be uniformly distributed. When $m, n \to +\infty$, it holds
that

• For $\Delta < \Delta_k$, $P_{\mathrm{unif}}(V \in \mathrm{FLAT}) \to 1$.

• For $\Delta > \Delta_k$, $P_{\mathrm{unif}}(V \in \mathrm{FLAT}) \to 0$.

Proof. We first note that $2(1 - 2^{-k})^{\Delta_k} = 1$, so that $E[Z] = [2(1 - 2^{-k})^{\Delta}]^n$ is
exponentially large when $\Delta < \Delta_k$, and exponentially small when $\Delta > \Delta_k$.

• For $\Delta > \Delta_k$, Markov's inequality yields

$$P_{\mathrm{unif}}(V \in \mathrm{FLAT}) = P_{\mathrm{unif}}(Z(V) \ge 1) \le E[Z] \to 0.$$

• For $\Delta < \Delta_k$, the Paley-Zygmund inequality and the result of Lemma 3.2 yield

$$P_{\mathrm{unif}}(V \in \mathrm{FLAT}) = P_{\mathrm{unif}}(Z(V) > 0) \ge \frac{E[Z]^2}{E[Z^2]} \to 1.$$

□
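To give a feel for the threshold in Theorem 3.3, the following sketch evaluates the constant $\log(1/2)/\log(1 - 2^{-k})$ numerically, compares it with the approximation $2^k \log(2)$, and evaluates the base-2 logarithm of the first moment from Lemma 3.1 on either side of the threshold. The function names are illustrative assumptions, not notation from the paper.

```python
import math

def delta_k(k: int) -> float:
    """Threshold log(1/2) / log(1 - 2^{-k}) from Theorem 3.3."""
    return math.log(0.5) / math.log(1.0 - 2.0 ** (-k))

def log2_expected_Z(n: int, k: int, delta: float) -> float:
    """log_2 of E[Z] = 2^n (1 - 2^{-k})^m in the linear regime m = delta * n."""
    return n + delta * n * math.log2(1.0 - 2.0 ** (-k))

for k in range(1, 6):
    print(k, round(delta_k(k), 4), round(2 ** k * math.log(2), 4))

n, k = 100, 3
for delta in (0.9 * delta_k(k), delta_k(k), 1.1 * delta_k(k)):
    print(round(delta, 3), round(log2_expected_Z(n, k, delta), 3))
```

Below the threshold the exponent is positive (the first moment is exponentially large), at the threshold it vanishes, and above it the exponent is negative.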

There is therefore a sharp phase transition in the linear regime, at $\Delta_k$, where
the limit of the probability of flat satisfiability switches from 1 to 0. This result
can be compared to the satisfiability transition for $k$-SAT problems, for which
$Z$ has the same expectation, but for which the second moment is much larger
than $E[Z]^2$. The proofs of satisfiability transitions [AP04, COP13, DSS14] are
therefore much more technical.


4. OPTIMAL DETECTION FOR PLANTED FLAT-SATISFIABILITY 

One can understand the two distributions by the following generating process.
Let $M_k$ be the number of subspaces of dimension $n - k$ in $\mathbb{F}_2^n$. There are therefore
$2^k M_k$ possible $k$-flats (equivalent to a choice of linear forms, and $k$ values). Under
the uniform distribution, $m$ flats are chosen independently and uniformly among
the $2^k M_k$ possible choices. Under $P_{x^*}$, there is an excluded choice of values, and
there are $(2^k - 1) M_k$ allowed flats, among which we draw independently and
uniformly $m$ flats. This interpretation of the distributions is useful to derive the
likelihood ratio, in the following.


Lemma 4.1. Let $V = (V_1, \ldots, V_m)$ be a collection of $m$ $k$-flats of $\mathbb{F}_2^n$. It holds that

$$\frac{P_{\mathrm{planted}}}{P_{\mathrm{unif}}}(V) = \frac{Z(V)}{E[Z]}.$$

Proof. By definition of $P_{\mathrm{planted}}$

$$\frac{P_{\mathrm{planted}}(V)}{P_{\mathrm{unif}}(V)} = \frac{1}{2^n} \sum_{x^* \in \mathbb{F}_2^n} \frac{P_{x^*}(V)}{P_{\mathrm{unif}}(V)}.$$

To compute the probabilities in the above ratios, we use the interpretation above
of $m$ drawings in $N = 2^k M_k$ possible flats independently if the distribution is
$P_{\mathrm{unif}}$, or otherwise in $N^* = (2^k - 1) M_k$ possible choices corresponding to flats
that do not contain $x^*$. Therefore, it holds for all $V$

$$\frac{P_{x^*}(V)}{P_{\mathrm{unif}}(V)} =
\begin{cases} 0 & \text{if } x^* \in \cup_j V_j \\ (N/N^*)^m & \text{otherwise.} \end{cases}$$

Therefore, the likelihood ratio can be expressed in terms of $1\{x \in S(V)\}$, and
$N/N^* = 1/(1 - 2^{-k})$

$$\frac{P_{\mathrm{planted}}}{P_{\mathrm{unif}}}(V)
= \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} \left(\frac{N}{N^*}\right)^m 1\{x \in S(V)\}
= \frac{1}{E[Z]} \sum_{x \in \mathbb{F}_2^n} 1\{x \in S(V)\}
= \frac{Z(V)}{E[Z]}.$$

□

The distribution $P_{\mathrm{planted}}$ therefore has a likelihood proportional to $Z(V)$: only
the flat satisfiable $V$ have a positive measure, and those with a large number of
flat satisfying assignments are more likely to occur. This can be contrasted with
the uniform distribution on FLAT, for which all flat satisfiable $V$ are equally likely.
One of the motivations behind the study of this likelihood ratio is its relationship
with the total variation distance. Indeed, we have

$$d_{TV}(P_{\mathrm{unif}}, P_{\mathrm{planted}}) = \frac{1}{2} E_{\mathrm{unif}} \left| \frac{Z}{E[Z]} - 1 \right|
\le \frac{1}{2} \sqrt{\frac{E[Z^2]}{E[Z]^2} - 1}.$$

The last inequality is a consequence of Jensen's inequality, and gives a more
tractable bound on the total variation distance. It is equivalent to considering
the $\chi^2$ divergence between the two distributions. When $\Delta < \Delta_k$, Lemma 3.2
yields

$$d_{TV}(P_{\mathrm{unif}}, P_{\mathrm{planted}}) \le \frac{1}{2} \sqrt{\frac{E[Z^2]}{E[Z]^2} - 1}
\le \frac{1}{2} \sqrt{o(1) + \frac{1}{E[Z]}} \to 0.$$

Note that this approach is not fruitful to control the total variation distance
in the $k$-SAT planted satisfiability problem, as $E[Z^2]$ is too large, in the linear
regime of $m = \Delta n$ for some constant $\Delta > 0$.

For this problem, when $\Delta > \Delta_k$, $P_{\mathrm{unif}}(Z > 0) \le E[Z] \to 0$. Checking flat
satisfiability, i.e. whether $Z > 0$, is therefore a test with a one-sided probability of error
equal to $P_{\mathrm{unif}}(Z > 0)$, as we have $P_{\mathrm{planted}}(Z > 0) = 1$. Together, these two
observations yield the following

Theorem 4.2. For a fixed $\Delta > 0$, let $m = \Delta n$. The following holds

• For $\Delta > \Delta_k$, and $\psi_{\mathrm{FLAT}}(V) = 1\{Z(V) > 0\}$

$$P_{\mathrm{unif}}(\psi_{\mathrm{FLAT}} = 1) \vee P_{\mathrm{planted}}(\psi_{\mathrm{FLAT}} = 0) \to 0.$$

• For $\Delta < \Delta_k$,

$$\inf_{\psi} \ P_{\mathrm{unif}}(\psi = 1) \vee P_{\mathrm{planted}}(\psi = 0) \to \frac{1}{2}.$$

We observe in the statistical problem the same phase transition as in Theorem 3.3:
the problem switches at $\Delta_k$ from being unsolvable (with a total variation
distance converging to 0) to the existence of a powerful test, i.e. checking flat
satisfiability. Note that in this regime, since $E[Z] < 1$, this test is equivalent to
the likelihood ratio test $Z(V) > E[Z]$.

The picture is clear from the statistical and probabilistic point of view. However,
from a computational point of view, checking if $Z$ is equal to 0 (i.e. if the
union of flats covers $\mathbb{F}_2^n$) is an NP-complete problem for $k \ge 3$, as $k$-SAT is a
particular case. An interesting question is whether there are detection methods
that can solve this problem in an algorithmically efficient manner.

5. POLYNOMIAL-TIME DETECTION

We study here the statistical performance of a test that runs in polynomial
time. We introduce some notations necessary to define this test. Let $W$ be a $k$-flat
of $\mathbb{F}_2^n$, defined by $k$ affine constraints

$$W = \{x \in \mathbb{F}_2^n : \ell_i(x) = \varepsilon_i, \ \forall i \in [k]\}.$$

We make the observation that $x$ does not lie on $W$ if and only if one of the above
equations is not satisfied, or equivalently, taking $\alpha_i = 1 - \varepsilon_i$

$$x \notin W \iff P_{\ell,\alpha}(x) := \prod_{i=1}^{k} (\ell_i(x) + \alpha_i) = 0.$$

Expanding the product, $P_{\ell,\alpha}$ can be written as a multivariate polynomial over $\mathbb{F}_2$ of degree
$k$

$$P_{\ell,\alpha}(x) = \sum_{S \subseteq [n], \ |S| \le k} c_S(\ell, \alpha) \prod_{s \in S} x_s.$$

Note that all the monomials are squarefree, as $z^2 = z$ for all $z \in \mathbb{F}_2$. Solving
the $k$-FLAT problem is therefore equivalent to solving a system of $m$ polynomial
equations of degree $k$. Of course, this is an NP-hard problem. In order to obtain
a test that is computationally tractable, we lift this system of equations to a
higher dimensional space to obtain a system of linear equations with quadratic
constraints, that we will then relax. This general idea is common over the reals [Par01,
Las01], and adapted here to a finite field. In this particular context, this approach
is inspired by [AG11], where this technique is used in a problem of learning with
errors.

Let $N_k = \sum_{a=0}^{k} \binom{n}{a} \le (n+1)^k$, and for $x \in \mathbb{F}_2^n$, let $X \in \mathbb{F}_2^{N_k}$ be such that
$X_S = \prod_{s \in S} x_s$. We remark that $P_{\ell,\alpha}$ takes the same values as a linear form $\mathcal{L}_{\ell,\alpha}$
over $\mathbb{F}_2^{N_k}$, such that $P_{\ell,\alpha}(x) = \mathcal{L}_{\ell,\alpha}(X)$ for the $X$ associated to $x$, by taking

$$\mathcal{L}_{\ell,\alpha}(X) = \sum_{S \subseteq [n], \ |S| \le k} c_S(\ell, \alpha) X_S.$$

If we consider the mapping $\phi$ from $\mathbb{F}_2^n$ to $\mathbb{F}_2^{N_k}$, the so-called Veronese embedding,
that associates $x$ to $X$, and $\mathcal{V} \subset \mathbb{F}_2^{N_k}$ the image of $\phi$, it is equivalent to solve
$P_{\ell,\alpha}(x) = 0$ over all of $\mathbb{F}_2^n$ and $\mathcal{L}_{\ell,\alpha}(X) = 0$ over $\mathcal{V}$. In particular, determining if
an instance of the $k$-FLAT problem is flat satisfiable is equivalent to determining
if a system of $m$ linear equations in $\mathbb{F}_2^{N_k}$ has a solution in $\mathcal{V}$. The image $\mathcal{V}$ can
be written as the intersection of quadratic constraints of the type $X_{\{1\}} X_{\{2\}} =
X_{\{1,2\}}$, making the system of equations intractable. In order to obtain a tractable
approximation of this problem, we consider the relaxed linear system of equations,
by keeping solely the constraint $X_{\emptyset} = 1$. Formally, for an instance $V$ of the $k$-FLAT
problem, we will consider for each flat $V_j$ the associated linear form $\mathcal{L}_{\ell_j,\alpha_j}$
and the overall system $\mathcal{L}_V$ of $m + 1$ linear equations in $\mathbb{F}_2^{N_k}$

$$\mathcal{L}_{\ell_j,\alpha_j}(X) = 0, \ \forall j \in [m] \ ; \quad X_{\emptyset} = 1.$$

Note that if $x^* \in \mathbb{F}_2^n$ is flat-satisfying for $V$, the associated $X^* = \phi(x^*) \in \mathbb{F}_2^{N_k}$ is
a solution to $\mathcal{L}_V$, as it is even a solution to the linear system of equations with
stricter constraint $X \in \mathcal{V}$. As a consequence, the system $\mathcal{L}_V$ always has a solution
for $V \sim P_{\mathrm{planted}}$. However, under the uniform distribution, it is not always the
case.

Lemma 5.1. Recall that $\Delta_k := \log(1/2)/\log(1 - 2^{-k}) \approx 2^k \log(2)$. Let $m =
\Delta N_k$ for $\Delta > \Delta_k$, and $V = (V_1, \ldots, V_m) \sim P_{\mathrm{unif}}$. The linear system $\mathcal{L}_V$ has no
solutions in $\mathbb{F}_2^{N_k}$, with probability converging to 1 when $n \to +\infty$.
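The linearization above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: linear forms are encoded as integer bitmasks, a polynomial over $\mathbb{F}_2$ as a set of squarefree monomials (frozensets of variable indices), and the hypothetical function `relaxed_system_has_solution` checks consistency of the relaxed system by Gaussian elimination, after substituting the value 1 for the coordinate indexed by the empty set.

```python
from itertools import combinations

def linear_factor(form: int, alpha: int, n: int) -> set:
    """Monomials of l(x) + alpha over F_2; a monomial is a frozenset of indices."""
    poly = {frozenset([i]) for i in range(n) if (form >> i) & 1}
    if alpha:
        poly.add(frozenset())  # constant term
    return poly

def multiply(p: set, q: set) -> set:
    """Product over F_2 with x_s^2 = x_s: monomials merge by union, coefficients mod 2."""
    out = set()
    for a in p:
        for b in q:
            out ^= {a | b}
    return out

def flat_polynomial(forms, eps, n: int) -> set:
    """Expand prod_i (l_i(x) + alpha_i), with alpha_i = 1 - eps_i."""
    poly = {frozenset()}
    for f, e in zip(forms, eps):
        poly = multiply(poly, linear_factor(f, 1 - e, n))
    return poly

def relaxed_system_has_solution(flats, n: int, k: int) -> bool:
    """Consistency of the relaxed linear system, one equation per flat."""
    monos = [frozenset(c) for r in range(1, k + 1)
             for c in combinations(range(n), r)]
    col = {s: i for i, s in enumerate(monos)}
    rhs = 1 << len(monos)   # extra bit storing the right-hand side
    basis = {}              # pivot column -> reduced augmented row
    for forms, eps in flats:
        row = 0
        for s in flat_polynomial(forms, eps, n):
            # The empty monomial has value 1, so its coefficient moves to the RHS.
            row ^= (1 << col[s]) if s else rhs
        while row & (rhs - 1):
            top = (row & (rhs - 1)).bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]
        else:
            if row & rhs:
                return False  # the equations imply 0 = 1
    return True

# Three 2-flats of F_2^3 that all avoid x* = 0b101, so the system is solvable.
planted = [([0b001, 0b010], [0, 0]),
           ([0b011, 0b100], [1, 0]),
           ([0b110, 0b001], [0, 1])]
print(relaxed_system_has_solution(planted, n=3, k=2))
```

Storing each augmented row as one integer keeps the elimination step to a single XOR; for large $n$ and $k$ a dedicated linear algebra routine over $\mathbb{F}_2$ would be the more practical choice.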

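Proof. Consider a fixed $Z \in \mathbb{F}_2^{N_k}$ such that $Z_{\emptyset} = 1$. For a $k$-flat $W$ described
by $(\ell, \alpha)$, we write $\mathcal{L}_{\ell,\alpha}(Z)$ as a function $q_{Z,\ell}$ of $\alpha \in \mathbb{F}_2^k$

$$q_{Z,\ell}(\alpha) = \sum_{S \subseteq [n], \ |S| \le k} c_S(\ell, \alpha) Z_S.$$

We observe that each $c_S(\ell, \cdot)$ is a multivariate multilinear polynomial (with monomials
that are squarefree), so that $q_{Z,\ell} \in \mathbb{F}_2[\alpha_1, \ldots, \alpha_k]$. Furthermore, the coefficient
of the monomial $\alpha_1 \cdots \alpha_k$ is $Z_{\emptyset} = 1$. As the squarefree monomials are
linearly independent, there exists an element $\alpha$ of $\mathbb{F}_2^k$ such that $q_{Z,\ell}(\alpha) \neq 0$. Therefore,
as $\alpha$ is uniformly distributed under the uniform distribution $q_0$, it holds
that

$$P_{\mathrm{unif}}(\mathcal{L}_{\ell,\alpha}(Z) = 0) = P_{\mathrm{unif}}(q_{Z,\ell}(\alpha) = 0) \le 1 - 2^{-k}.$$

As an aside, note that this bound is tight. Indeed, for all $Z \in \mathcal{V}$, the event
$\mathcal{L}_{\ell,\alpha}(Z) = 0$ is equivalent to $z \notin W$, for $z = \phi^{-1}(Z)$. The probability of this event
is $1 - 2^{-k}$, as seen in the proof of Lemma 3.1.

Let $V = (V_1, \ldots, V_m) \sim P_{\mathrm{unif}}$. By independence, we obtain directly that

$$P_{\mathrm{unif}}(\mathcal{L}_{\ell_j,\alpha_j}(Z) = 0, \ \forall j \in [m]) \le (1 - 2^{-k})^m.$$

By a union bound over all elements of $\mathbb{F}_2^{N_k}$, it holds that

$$P_{\mathrm{unif}}(\mathcal{L}_V \text{ has a solution}) \le 2^{N_k} (1 - 2^{-k})^m.$$

Taking $\Delta > \Delta_k$ yields the desired result. □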

We consider the test $\psi_{\mathcal{L}} : V \mapsto 1\{\mathcal{L}_V \text{ has a solution}\}$. When $m$ is of order
$N_k \le (n+1)^k$, it is possible to construct and solve the linear system, and thus
to determine the outcome of the test, in time polynomial in $n$ by Gaussian elimination.

The result of Lemma 5.1 gives a guarantee, in terms of sample size, about the
performance of this test.

Theorem 5.2. Let $m = \Delta n^k$, for $\Delta > \Delta_k$. It holds that

$$P_{\mathrm{unif}}(\psi_{\mathcal{L}} = 1) \vee P_{\mathrm{planted}}(\psi_{\mathcal{L}} = 0) \to 0.$$

There are several remarks that one can make about this result. The test $\psi_{\mathcal{L}}$
makes it possible to distinguish the two distributions with probability of error going to 0,
with computation time and sample size that are both polynomial in $n$. In particular,
we show that the sample size $m$ needs only to be of order $n^k$, which can
be compared to results in [AG11], where this linearization procedure is shown to
recover an analogue to the planted assignment $x^*$, with sample size . The statistical
performance shown here is however suboptimal, and it is not clear whether
there exists a test that runs in time polynomial in $n$ and that can distinguish the
two distributions with high probability for a sample size linear in $n$, the optimal
regime, that can be seen as a benchmark.

There are other detection problems for which the optimal regime of detection
is not known to be attainable by algorithmically efficient testing methods.
In particular, for the planted clique problem [Jer92, Kuc95] in a graph with $n$
vertices, even though cliques of size greater than $2 \log_2(n)$ can be detected or
recovered, polynomial-time algorithms are only known to be efficient at size $\sqrt{n}$
[AKS98], widely believed to be optimal. This hypothesis has recently been used as
a primitive to show hardness for other learning problems. This problem, as well
as those of estimating planted assignments for CSP problems, have been studied,
and computational lower bounds shown to exist, in a specific computational
model [FGR+13, FPV13].

A common type of method to solve these detection problems, one that comes
naturally to mind to find an improved algorithm for this problem - i.e. one that
would need significantly fewer than $n^k$ samples - is to study the behavior of a
judiciously chosen, tractable statistic $S$ of the data $D$. When $D$ is constituted
of $m$ independent samples, let us consider only $S$ that are sums of statistics
of $r$-tuples of the data, for a finite $r$. Simply, these approaches revolve around
showing that $S(D)$ behaves differently under the two distributions of interest, say
$E_{\mathrm{uniform}}[S(D)] = 0$, and $E_{\mathrm{planted}}[S(D)] = \mu > 0$, and by showing that when the
sample size is large enough, $\mu$ is much greater than the typical deviations of $S$,
making a test such as $1\{S(D) > \mu/2\}$ powerful. Typical examples include
statistics based on the degrees of vertices in a graph, bias in signs of literals in a
CSP, etc.

This is not the approach used here, where our test is based on the existence of 
an element verifying certain properties - here being a solution to a linear system 
of equations in a finite field - not on summing a certain statistic over i.i.d samples 
(or couples, or triplets of these samples). In the following section, we describe a 
modified version of our hypothesis testing problem, by introducing the model 
of light planting. We show that even though it does not change the statistical 
nature of the problem, it is as hard as the “Learning Parity with Noise” problem, 
strongly suggesting that it cannot be efficiently solved. Therefore, it is highly 
improbable that any method that is robust to this modification - which is often 
true for the approaches based on biases of statistics, as described above - could 
be successful for detection of planted flat satisfiability. 

6. DETECTION OF LIGHTLY PLANTED FLAT-SATISFIABILITY 

We consider a modified version of our hypothesis testing problem. It has the
same null hypothesis and in the alternative, planting only happens with some
constant probability $\pi \in (0, 1)$, which we call light planting. Formally, we denote
by $q_{x,\pi} := (1 - \pi) q_0 + \pi q_x$ the distribution on the flats of dimension $n - k$ that is a
mixture of the uniform $q_0$ and of the planting distribution $q_x$, and define similarly
$P_{x,\pi}$ and $P_{\mathrm{planted},\pi}$. As in the planting model with $\pi = 1$, we have

$$P_{x,\pi} := q_{x,\pi}^{\otimes m}, \qquad P_{\mathrm{planted},\pi} := \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} P_{x,\pi}.$$

The alternative hypothesis is therefore replaced with $V = (V_1, \ldots, V_m) \sim
P_{\mathrm{planted},\pi}$.

6.1 Optimal rates of detection for light planting 


To tackle this problem, we consider for a given set of flats $V$ the following
statistics

$$s(V, x) = |\{j : x \notin V_j\}|, \quad \text{and} \quad \sigma(V) = \max_{x \in \mathbb{F}_2^n} s(V, x).$$

They are respectively the number of flats of $V$ on which $x$ does not lie, and the
maximum number of flat constraints simultaneously satisfiable by an element of
$\mathbb{F}_2^n$. We derive the following deviation bounds for this second statistic under both
hypotheses.


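Lemma 6.1. For a fixed $\Delta > 0$, let $m = \Delta n$. It holds that

$$P_{\mathrm{unif}}\left(\sigma(V) \ge [(1 - 2^{-k}) + \alpha] m\right) \le \exp\left(-[2\alpha^2 \Delta - \log(2)] n\right)$$

$$P_{\mathrm{planted},\pi}\left(\sigma(V) \le [(1 - 2^{-k}) + \pi 2^{-k} - \alpha] m\right) \le \exp(-2\alpha^2 \Delta n).$$

Proof. For all $x \in \mathbb{F}_2^n$, we observe that under the null hypothesis, the variable
$s(V, x)$ has distribution $B(m, 1 - 2^{-k})$. Therefore, by Hoeffding's inequality,

$$P_{\mathrm{unif}}\left(s(V, x) \ge [(1 - 2^{-k}) + \alpha] m\right) \le \exp(-2\alpha^2 m).$$

A union bound on $\mathbb{F}_2^n$ yields

$$P_{\mathrm{unif}}\left(\sigma(V) \ge [(1 - 2^{-k}) + \alpha] m\right) \le 2^n \exp(-2\alpha^2 m) \le \exp\left(-[2\alpha^2 \Delta - \log(2)] n\right).$$

Under $P_{x^*,\pi}$ the variable $s(V, x^*)$ has distribution $B(m, (1 - 2^{-k}) + \pi 2^{-k})$. By
Hoeffding's inequality,

$$P_{x^*,\pi}\left(s(V, x^*) \le [(1 - 2^{-k}) + \pi 2^{-k} - \alpha] m\right) \le \exp(-2\alpha^2 m).$$

By definition of $P_{\mathrm{planted},\pi}$ and since $\sigma(V) \ge s(V, x)$ for all $x \in \mathbb{F}_2^n$, we obtain the
desired result. □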


These deviation bounds can be used to prove that a particular test is powerful in the
linear regime.

Theorem 6.2. For a fixed $\Delta > 0$, let $m = \Delta n$, $\Delta_{k,\pi} := 2^{k-1} \log(2)/\pi^2$ and
$\overline{\Delta}_{k,\pi} := 2^{2k+1} \log(2)/\pi^2$. It holds that

• For $\Delta > \overline{\Delta}_{k,\pi}$, and $\psi_{\sigma}(V) = 1\{\sigma(V) > [(1 - 2^{-k}) + \pi 2^{-(k+1)}] m\}$

$$P_{\mathrm{unif}}(\psi_{\sigma} = 1) \vee P_{\mathrm{planted},\pi}(\psi_{\sigma} = 0) \to 0.$$

• For $\Delta < \Delta_{k,\pi}$,

$$\inf_{\psi} \ P_{\mathrm{unif}}(\psi = 1) \vee P_{\mathrm{planted},\pi}(\psi = 0) \to \frac{1}{2}.$$

The first point of this result is a direct consequence of Lemma 6.1. The proof
of the second point is based on a bound on the $\chi^2$ divergence between the two
distributions, similarly to the result in Theorem 4.2. A full proof of the theorem
can be found in Appendix A. If we consider $\pi$ to be a constant, the optimal rate
of detection for the light planting version of the problem is therefore still in the
linear regime $m = \Delta n$. Furthermore the right dependency of $\Delta$ on $\pi$ is in
$1/\pi^2$, up to constants that only depend on $k$.
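For small instances, the statistic defined in this section (the maximum number of flats avoided by a single point) and the thresholded test of Theorem 6.2 can be computed by brute force. The sketch below is an illustration only: it assumes flats are encoded as pairs (forms, values) with linear forms as integer bitmasks, the names `s`, `sigma` and `psi_sigma` are hypothetical, and the enumeration over all $2^n$ points is exponential in $n$.

```python
def parity(v: int) -> int:
    """Value in F_2 of a linear form: parity of the set bits of v."""
    return bin(v).count("1") & 1

def s(flats, x: int) -> int:
    """Number of flats on which x does not lie."""
    return sum(1 for forms, eps in flats
               if any(parity(f & x) != e for f, e in zip(forms, eps)))

def sigma(flats, n: int) -> int:
    """Maximum number of flat constraints satisfied by a single point (2^n enumeration)."""
    return max(s(flats, x) for x in range(1 << n))

def psi_sigma(flats, n: int, k: int, pi: float) -> int:
    """Test 1{sigma(V) > [(1 - 2^{-k}) + pi * 2^{-(k+1)}] m}."""
    m = len(flats)
    threshold = ((1.0 - 2.0 ** (-k)) + pi * 2.0 ** (-(k + 1))) * m
    return int(sigma(flats, n) > threshold)

# Three 1-flats (hyperplanes) of F_2^2: {x0 = 0}, {x0 = 1}, {x1 = 0}.
flats = [([0b01], [0]), ([0b01], [1]), ([0b10], [0])]
print(sigma(flats, 2), psi_sigma(flats, 2, 1, 0.5))
```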

6.2 Computational aspects of light planting 

The algorithmically efficient testing method described in Section 5 is not robust
to this modification of the hypothesis testing problem: it relies heavily on the
fact that for $V \sim P_{\mathrm{planted}}$ there exists some $x^*$ that is flat-satisfying, which
guarantees in turn the existence of a solution to the linear system $\mathcal{L}_V$. This
reasoning does not go through under the light planting model.

We give here strong reasons to believe that improving the result of Theorem 5.2
- for the case $\pi = 1$ - by using this type of method is hopeless. Our reasoning
is that such an approach would be robust to light planting, and would allow one to
distinguish $P_{\mathrm{unif}}$ and $P_{\mathrm{planted},\pi}$ with sample size and running time polynomial
in $n$. The following result shows that this would imply in turn the existence of
an efficient method for the decision version of the “Learning Parity with Noise”
(LPN) problem of [BKW03], known to be as hard as the recovery of the “secret”
signal. This is conjectured to be a hard problem, for which the best algorithms
run in time , and which is used to prove the security of cryptographic systems
(see [Pie12], and references within).

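Let $(A, b) \in \mathbb{F}_2^{m \times n} \times \mathbb{F}_2^m$ be an instance of LPN. For each $j \in [m]$, let
$\gamma_{j,1}, \ldots, \gamma_{j,k-1}$
be $k - 1$ uniformly random, linearly independent linear forms of $\mathbb{F}_2^n$, themselves
independent of the linear form $\phi_j$ generated by $A_j$. If $A_j$ is uniformly random, the
$n - k$ dimensional linear subspace of $\mathbb{F}_2^n$ that is the vanishing set of these $k$ linear
forms is therefore uniformly random as well. Furthermore, let $\beta_{j,1}, \ldots, \beta_{j,k-1}$ be
$k - 1$ independent, uniformly random elements of $\mathbb{F}_2$, independent of $b_j$. Take
$\ell_{j,1}, \ldots, \ell_{j,k}$ to be equal to $\gamma_{j,1}, \ldots, \gamma_{j,k-1}, \phi_j$ in a uniformly random order, and
$\varepsilon_{j,1}, \ldots, \varepsilon_{j,k}$ to be equal to $\beta_{j,1}, \ldots, \beta_{j,k-1}, 1 - b_j$ in the same order. The equation
$\ell_j(x) = \varepsilon_j$ defines the $n - k$ dimensional flat $V_j$.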

Lemma 6.3. Let (A, b) be an instance of LPN, and V the associated instance 
of /c-FLAT obtained by the procedure described above. The following holds 

• If (A, b) are independent and uniformly random, V ~ Punif- 

• If (A,b) is an instance with secret x, and probability of error ij < 1/2, 
V ~ Px,-K, with TT = 1 — 2r]. 

Proof. In all cases, the $k$-flats are independent, and the $m$ sets of $k$ linear 
forms are uniformly distributed. If $(A, b)$ is uniformly random, so are the $b_j$, and 
as a consequence, the $\varepsilon_j$. This yields the desired $V \sim \mathbf{P}_{\mathrm{unif}}$. However, if there 
is a secret $x$, $\phi_j(x) = 1 - b_j$ with probability $\eta$. The distribution of 
$1 - b_j - \phi_j(x)$ is therefore a mixture of the uniform distribution on $\mathbf{F}_2$ (with 
weight $1 - \pi$) and of the unit mass at 1 (with weight $\pi$). The distribution of 
$\varepsilon_j - \ell_j(x)$ is thus the mixture of the uniform distribution on $\mathbf{F}_2^k$ (with weight 
$1 - \pi$) and of the distribution on $\mathbf{F}_2^k \setminus \{0\}$ generated by placing a 1 in one of 
the coefficients of $\varepsilon_j - \ell_j(x)$ and letting the others be independent and uniform. 
As shown in Remark 2.1, the resulting flat $V_j$ has distribution $q_{x,\pi}$ and 
$V \sim \mathbf{P}_{x,\pi}$, as desired. □ 
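
The mixture weights used in the proof can be checked directly. The short computation below is a sketch, writing $Z_j = 1 - b_j - \phi_j(x)$ (a symbol introduced here for convenience) and assuming the usual LPN convention that $b_j$ differs from $\phi_j(x)$ with probability $\eta$:

```latex
\begin{align*}
\mathbf{P}(Z_j = 0) &= \mathbf{P}\big(\phi_j(x) = 1 - b_j\big) = \eta = \frac{1 - \pi}{2}, \\
\mathbf{P}(Z_j = 1) &= 1 - \eta = \frac{1 - \pi}{2} + \pi,
\qquad \text{with } \pi = 1 - 2\eta .
\end{align*}
```

The first line is the mass that the uniform component of weight $1 - \pi$ puts on 0, and the second adds the unit mass at 1 of weight $\pi$.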

From a computational point of view, there is a very strong difference between 
the problems of detecting planted solutions to flat satisfiability, and detecting 
solutions that are only lightly planted, for any constant $\pi \in (0,1)$. It seems 
impossible to adapt the result of Theorem 5.2 to this new setting, and to describe 
an efficient algorithm that can distinguish these distributions for a sample size of 
order $n^k/\pi^2$, similarly to the result of Theorem 6.2. 

The testing methods based on simple statistics (i.e. sums of simpler statistics 
that depend on finite $r$-tuples of samples), as described in Section 5, are usually 
robust to these modifications. As an example, for the planted clique problem, 
consider a light planting distribution that only plants edges in the small subgraph 
with probability $\pi$. The sum of the degrees of all the vertices has mean $n(n-1)/2$ 
under the null, and respectively $n(n-1)/2 + k(k-1)/2$ and $n(n-1)/2 + \pi k(k-1)/2$ 
under the planted, or lightly planted distribution. Deviation bounds will therefore 
show that a test based on this statistic will be successful when $k > C\sqrt{n}$ under the 
planted model and $k > C\sqrt{n/\pi}$ under the lightly planted model, for some constant 
$C > 0$. The rates of detection for this method are not changed by this modification, 
for a constant $\pi$. The situation is similar for detection of planted satisfiability 
[Ber14, Thm 3.1]: a statistic based on joint signs of variables appearing several 
times in the formula has mean 0 under the uniform distribution, and mean 
$1/[2(2^k - 1)]$ under the planted distribution, and would have mean $\pi^2/[2(2^k - 1)]$ 
under the light planting model. The necessary sample size $m$ of order $\sqrt{n}$ in this 
problem would only be affected in the constant by $\pi$. 
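
The degree-sum means in the planted clique example can be recomputed exactly from edge probabilities. The sketch below is illustrative only: the function name `degree_sum_mean` is hypothetical, and it assumes that under light planting each edge inside the small subgraph is forced to be present with probability $\pi$ and is otherwise a fair coin, all other edges being fair coins.

```python
from fractions import Fraction
from math import comb


def degree_sum_mean(n, k, pi):
    """Expected sum of vertex degrees for a k-subgraph planted with weight pi.

    pi = 0 is the null G(n, 1/2), pi = 1 is the planted clique, and
    0 < pi < 1 is the light planting assumed in this sketch.
    """
    pi = Fraction(pi)
    pairs_in = comb(k, 2)               # pairs inside the small subgraph
    pairs_out = comb(n, 2) - pairs_in   # all remaining pairs
    p_in = pi + (1 - pi) * Fraction(1, 2)  # edge probability inside the subgraph
    # Each edge contributes 2 to the sum of the degrees.
    return 2 * (pairs_out * Fraction(1, 2) + pairs_in * p_in)
```

For any inputs this agrees with $n(n-1)/2 + \pi k(k-1)/2$, so the shift away from the null mean is scaled by $\pi$ while the fluctuations of the statistic remain of order $n$.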

Here, this problem is significantly harder to solve in an algorithmically efficient 
manner when light planting is introduced. Any candidate algorithm to solve the 
planting problem (with $\pi = 1$) would therefore have to be of a different type than 
those informally described above, and could not be robust to this type of 
modification in the distributions. Indeed, an algorithm robust to light planting 
that is statistically and algorithmically efficient could otherwise be used to solve 
the LPN problem, as shown in Lemma 6.3. 
APPENDIX A: PROOF OF THEOREM 6.2 

Proof. For $\Delta >$ taking $\alpha =$ in the results of Lemma 6.1 yields 
the desired upper bound, as $2\alpha^2\Delta - \log(2) > 0$. 

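For $\Delta < \Delta_{k,\pi}$, we derive a bound on the total variation distance 
$d_{TV}(\mathbf{P}_{\mathrm{unif}}, \mathbf{P}_{\mathrm{planted},\pi})$, through the inequality 

$$
d_{TV}(\mathbf{P}_{\mathrm{unif}}, \mathbf{P}_{\mathrm{planted},\pi})
= \frac{1}{2}\, \mathbf{E}_{\mathrm{unif}} \left| \frac{\mathbf{P}_{\mathrm{planted},\pi}}{\mathbf{P}_{\mathrm{unif}}}(V) - 1 \right|
\le \frac{1}{2} \left( \mathbf{E}_{\mathrm{unif}} \left[ \left( \frac{\mathbf{P}_{\mathrm{planted},\pi}}{\mathbf{P}_{\mathrm{unif}}}(V) - 1 \right)^2 \right] \right)^{1/2} .
$$
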

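The term inside the square root is equal to the chi-square divergence 
$\chi^2(\mathbf{P}_{\mathrm{planted},\pi}, \mathbf{P}_{\mathrm{unif}})$ between the two distributions. We write 
$\mathbf{P}_{x,\pi} = q_{x,\pi}^{\otimes m}$ and $\mathbf{P}_{\mathrm{unif}} = q_0^{\otimes m}$ as products of the distribution of each 
independent $V_j$. Writing out $\mathbf{P}_{\mathrm{planted},\pi}$ as a uniform mixture of the $\mathbf{P}_{x,\pi}$ yields 

$$
\begin{aligned}
\chi^2(\mathbf{P}_{\mathrm{planted},\pi}, \mathbf{P}_{\mathrm{unif}})
&= \frac{1}{2^{2n}} \sum_{x, x'} \mathbf{E}_{\mathrm{unif}} \left[ \frac{\mathbf{P}_{x,\pi}\, \mathbf{P}_{x',\pi}}{\mathbf{P}_{\mathrm{unif}}\, \mathbf{P}_{\mathrm{unif}}}(V) \right] - 1 \\
&= \frac{1}{2^{2n}} \sum_{x, x'} \mathbf{E} \left[ \frac{q_{x,\pi}\, q_{x',\pi}}{q_0\, q_0}(V_1) \right]^m - 1 \\
&= \frac{1}{2^{2n}} \sum_{x} \mathbf{E} \left[ \left( \frac{q_{x,\pi}}{q_0}(V_1) \right)^2 \right]^m
 + \frac{1}{2^{2n}} \sum_{x \neq x'} \mathbf{E} \left[ \frac{q_{x,\pi}\, q_{x',\pi}}{q_0\, q_0}(V_1) \right]^m - 1 .
\end{aligned}
$$
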


Note that $q_{x,\pi} = (1 - \pi) q_0 + \pi q_x$, where $q_x$ is the uniform distribution on $k$-flats 
that do not contain $x$ (the planting distribution), so that 

$$
\frac{q_{x,\pi}}{q_0} = 1 + \pi \left( \frac{q_x}{q_0} - 1 \right) .
$$


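Substituting this in the above yields 

$$
\begin{aligned}
\chi^2(\mathbf{P}_{\mathrm{planted},\pi}, \mathbf{P}_{\mathrm{unif}})
&= \frac{1}{2^{2n}} \sum_{x \in \mathbf{F}_2^n} \left( 1 + \pi^2 \left( \mathbf{E} \left[ \left( \frac{q_x}{q_0}(V_1) \right)^2 \right] - 1 \right) \right)^m \\
&\quad + \frac{1}{2^{2n}} \sum_{x \neq x'} \left( 1 + \pi^2 \left( \mathbf{E} \left[ \frac{q_x\, q_{x'}}{q_0\, q_0}(V_1) \right] - 1 \right) \right)^m - 1 .
\end{aligned}
$$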
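
The substitution step rests on a one-line expansion, sketched here with the shorthand $a = (q_x/q_0)(V_1)$ and $b = (q_{x'}/q_0)(V_1)$ (symbols introduced only for this check). Both are likelihood ratios, so $\mathbf{E}[a] = \mathbf{E}[b] = 1$ when $V_1 \sim q_0$:

```latex
\begin{align*}
\mathbf{E}\big[(1 + \pi(a - 1))(1 + \pi(b - 1))\big]
&= 1 + \pi\,\mathbf{E}[a - 1] + \pi\,\mathbf{E}[b - 1] + \pi^2\,\mathbf{E}[(a - 1)(b - 1)] \\
&= 1 + \pi^2\big(\mathbf{E}[ab] - 1\big).
\end{align*}
```

Taking $x' = x$ gives the diagonal terms, and $x' \neq x$ the cross terms.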
Furthermore, for any $k$-flat $V_1$, it holds that $(q_x/q_0)(V_1) = (N/N_k)\,\mathbf{1}\{x \notin V_1\}$. We 
give the following upper bound on the last two terms of this equation's RHS, 

$$
\begin{aligned}
\frac{1}{2^{2n}} \sum_{x \neq x'} \left( 1 + \pi^2 \left( \mathbf{E} \left[ \frac{q_x\, q_{x'}}{q_0\, q_0}(V_1) \right] - 1 \right) \right)^m - 1
&\le \frac{1}{2^{2n}} \sum_{x \neq x'} \left( 1 + \pi^2 \left( \frac{\mathbf{P}_{\mathrm{unif}}(x, x' \notin V_1)}{(1 - 2^{-k})^2} - 1 \right) \right)^m - 1 \\
&\le \left( 1 + \frac{C_k}{2^n - 1} \right)^m - 1 ,
\end{aligned}
$$

for some constant $C_k > 0$ (independent of $n$ and $\pi$), by the formula for 
$\mathbf{P}_{\mathrm{unif}}(x, x' \notin V_1)$ derived in the proof of Lemma 3.2. The last term converges to 0 
when $n \to +\infty$. We bound as well the first term of the main equation's RHS 


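$$
\begin{aligned}
\frac{1}{2^{2n}} \sum_{x \in \mathbf{F}_2^n} \left( 1 + \pi^2 \left( \mathbf{E} \left[ \left( \frac{q_x}{q_0}(V_1) \right)^2 \right] - 1 \right) \right)^m
&= 2^{-n} \left( 1 + \pi^2 \left( \mathbf{P}_{\mathrm{unif}}(x \notin V_1)^{-1} - 1 \right) \right)^m \\
&\le \left[ \frac{1}{2} \left( 1 + \frac{\pi^2}{2^k - 1} \right)^{\Delta} \right]^n .
\end{aligned}
$$
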


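Taking $\Delta < \Delta_{k,\pi} = 2^k \log(2)/\pi^2$ yields $\frac{1}{2}\left(1 + \pi^2/(2^k - 1)\right)^{\Delta} < 1$, and all the 
terms of $\chi^2(\mathbf{P}_{\mathrm{planted},\pi}, \mathbf{P}_{\mathrm{unif}})$ go to 0 when $n \to +\infty$. 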
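
The condition on $\Delta$ in this last step can be made explicit. The computation below is a sketch, with the shorthand $u = \pi^2/(2^k - 1)$ introduced only here; it uses nothing beyond $\log(1 + u) \le u$:

```latex
\begin{align*}
\frac{1}{2}(1 + u)^{\Delta} < 1
\;&\Longleftrightarrow\;
\Delta < \frac{\log 2}{\log(1 + u)} , \\
\Delta < \frac{\log 2}{u} = \frac{(2^k - 1)\log 2}{\pi^2}
\;&\Longrightarrow\;
(1 + u)^{\Delta} \le e^{\Delta u} < 2 .
\end{align*}
```

For large $k$, the exact threshold $\log 2/\log(1 + u)$ is equivalent to $2^k \log(2)/\pi^2$.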